我们提出了一个新颖的框架,以研究异步联合学习优化,并在梯度更新中延迟。我们的理论框架通过引入随机聚合权重来表示客户更新时间的可变性,从而扩展了标准的FedAvg聚合方案,例如异质硬件功能。我们的形式主义适用于客户具有异质数据集并至少执行随机梯度下降(SGD)的一步。我们证明了这种方案的收敛性,并为相关最小值提供了足够的条件,使其成为联邦问题的最佳选择。我们表明,我们的一般框架适用于现有的优化方案,包括集中学习,FedAvg,异步FedAvg和FedBuff。这里提供的理论允许绘制有意义的指南,以设计在异质条件下的联合学习实验。特别是,我们在这项工作中开发了FedFix,这是FedAvg的新型扩展,从而实现了有效的异步联合训练,同时保留了同步聚合的收敛稳定性。我们在一系列实验上凭经验证明了我们的理论,表明异步FedAvg以稳定性为代价导致快速收敛,我们最终证明了FedFix比同步和异步FedAvg的改善。
translated by 谷歌翻译
虽然客户的采样是当前最先进的联邦学习(FL)方法的核心运营,但该程序对迄今为止的迄今为止迄今为止的收敛和速度的影响。在这项工作中,我们为FL的收敛介绍了一种新颖的分解定理,允许清楚地量化客户对全局模型更新的影响。与之前的收敛分析相反,我们的定理提供了给定的收敛步骤的精确分解,从而能够准确考虑客户端采样和异质性的作用。首先,我们为先前报告的结果提供了一种理论基础,从收敛性与聚集权重之间的关系之间的关系。其次,我们首次证明了FL收敛的质量也受到聚集重量之间产生的协方差的影响。第三,我们建立了聚集权重的总和是另一个减速的来源,应该等于1来提高流动速度。我们的理论是一般性的,这里申请了多项分布(MD)和统一采样,在FL中的两个默认客户端采样,并通过一系列非IID和不平衡情景进行了演示。我们的结果表明,MD采样应用作默认采样方案,因为在学习过程中的数据比变化的恢复,而统一的采样仅在客户端具有相同数量的数据时才是优越的。
translated by 谷歌翻译
This short study reformulates the statistical Bayesian learning problem using a quantum mechanics framework. Density operators representing ensembles of pure states of sample wave functions are used in place probability densities. We show that such representation allows to formulate the statistical Bayesian learning problem in different coordinate systems on the sample space. We further show that such representation allows to learn projections of density operators using a kernel trick. In particular, the study highlights that decomposing wave functions rather than probability densities, as it is done in kernel embedding, allows to preserve the nature of probability operators. Results are illustrated with a simple example using discrete orthogonal wavelet transform of density operators.
translated by 谷歌翻译
Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism which propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor long-range dependencies, that can be solved using global attention but significantly increases the computational cost to quadratic complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph MLP-Mixer, that holds three key properties. First, they capture long-range dependency and mitigate the issue of over-squashing as demonstrated on the Long Range Graph Benchmark (LRGB) and the TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency with a complexity linear to the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them.
translated by 谷歌翻译
We study inductive matrix completion (matrix completion with side information) under an i.i.d. subgaussian noise assumption at a low noise regime, with uniform sampling of the entries. We obtain for the first time generalization bounds with the following three properties: (1) they scale like the standard deviation of the noise and in particular approach zero in the exact recovery case; (2) even in the presence of noise, they converge to zero when the sample size approaches infinity; and (3) for a fixed dimension of the side information, they only have a logarithmic dependence on the size of the matrix. Differently from many works in approximate recovery, we present results both for bounded Lipschitz losses and for the absolute loss, with the latter relying on Talagrand-type inequalities. The proofs create a bridge between two approaches to the theoretical analysis of matrix completion, since they consist in a combination of techniques from both the exact recovery literature and the approximate recovery literature.
translated by 谷歌翻译
We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.
translated by 谷歌翻译
When searching for policies, reward-sparse environments often lack sufficient information about which behaviors to improve upon or avoid. In such environments, the policy search process is bound to blindly search for reward-yielding transitions and no early reward can bias this search in one direction or another. A way to overcome this is to use intrinsic motivation in order to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolutionary strategy adapted to use Curiosity as a fitness metric. We compare Curiosity with Novelty, a commonly used diversity metric, and find that Curiosity can generate higher diversity over full episodes without the need for an explicit diversity criterion and lead to multiple policies which find reward.
translated by 谷歌翻译
Deep Neural Networks (DNNs) outshine alternative function approximators in many settings thanks to their modularity in composing any desired differentiable operator. The formed parametrized functional is then tuned to solve a task at hand from simple gradient descent. This modularity comes at the cost of making strict enforcement of constraints on DNNs, e.g. from a priori knowledge of the task, or from desired physical properties, an open challenge. In this paper we propose the first provable affine constraint enforcement method for DNNs that requires minimal changes into a given DNN's forward-pass, that is computationally friendly, and that leaves the optimization of the DNN's parameter to be unconstrained i.e. standard gradient-based method can be employed. Our method does not require any sampling and provably ensures that the DNN fulfills the affine constraint on a given input space's region at any point during training, and testing. We coin this method POLICE, standing for Provably Optimal LInear Constraint Enforcement.
translated by 谷歌翻译
Shapley values are ubiquitous in interpretable Machine Learning due to their strong theoretical background and efficient implementation in the SHAP library. Computing these values previously induced an exponential cost with respect to the number of input features of an opaque model. Now, with efficient implementations such as Interventional TreeSHAP, this exponential burden is alleviated assuming one is explaining ensembles of decision trees. Although Interventional TreeSHAP has risen in popularity, it still lacks a formal proof of how/why it works. We provide such proof with the aim of not only increasing the transparency of the algorithm but also to encourage further development of these ideas. Notably, our proof for Interventional TreeSHAP is easily adapted to Shapley-Taylor indices and one-hot-encoded features.
translated by 谷歌翻译
尽管自我监督学习(SSL)方法取得了经验成功,但尚不清楚其表示的哪些特征导致了高下游精度。在这项工作中,我们表征了SSL表示应该满足的属性。具体而言,我们证明了必要和充分的条件,因此,对于给出的数据增强的任何任务,在该表示形式上训练的所需探针(例如,线性或MLP)具有完美的准确性。这些要求导致一个统一的概念框架,用于改善现有的SSL方法并得出新方法。对于对比度学习,我们的框架规定了对以前的方法(例如使用不对称投影头)的简单但重大改进。对于非对比度学习,我们使用框架来得出一个简单新颖的目标。我们所得的SSL算法在标准基准测试上的表现优于基线,包括Imagenet线性探测的SHAV+多螺旋桨。
translated by 谷歌翻译